The Listening Machine: 1st Annual Report

Author

  • Daniel P.W. Ellis
Abstract

In this first year of the project, our work focused on the problem of identifying and separating specific sound sources in mixtures. The core of our approach is to use prior knowledge about the sounds in the world, encapsulated in a model, to provide the constraints needed to solve the blind separation problem, which is otherwise ill-posed. We have looked at using this approach in a reverberant multi-microphone case. In collaboration with Bhiksha Raj of MERL in Cambridge, we looked at setting the parameters of a filter-and-sum beamformer by gradient descent on the match between the separated signals and a constrained speech approximation, built from the model means of the states along the best-match path found by a speech recognizer [9]. The beamformer parameters and the speech recognizer state path can be re-estimated alternately; we found this process to converge successfully after a few cycles. When just a single voice is present, this process amounts to blind estimation of a dereverberation filter. We were more interested, however, in the problem of multiple overlapping voices, which requires two initial speech recognizer state paths. This required a factorial-HMM model for the
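As a rough illustration of the beamformer-fitting step described in the abstract, the sketch below fits filter-and-sum taps by gradient descent on a mean squared error against a fixed target sequence. It is a simplification under stated assumptions, not the method of the report: the target here is a plain time-domain signal standing in for the speech approximation built from recognizer model means, the alternating re-estimation of the recognizer state path is not shown, and all function names, tap counts, and step sizes are hypothetical.

```python
import numpy as np


def delayed_stack(x, n_taps):
    """Return an array of shape (n_mics, n_taps, n_samples) holding x_m[n - k]."""
    n_mics, n_samples = x.shape
    stack = np.zeros((n_mics, n_taps, n_samples))
    for k in range(n_taps):
        # Delay each microphone signal by k samples, zero-padding the start.
        stack[:, k, k:] = x[:, :n_samples - k]
    return stack


def filter_and_sum(h, stack):
    """Filter-and-sum output: y[n] = sum_m sum_k h[m, k] * x_m[n - k]."""
    return np.einsum("mk,mkn->n", h, stack)


def fit_beamformer(x, target, n_taps=8, step=0.05, n_iter=200):
    """Gradient descent on mean((y - target)**2) over the FIR taps h.

    `target` is an assumed stand-in for the model-based speech approximation.
    Returns the fitted taps and the loss recorded before each update.
    """
    stack = delayed_stack(x, n_taps)
    h = np.zeros((x.shape[0], n_taps))
    losses = []
    for _ in range(n_iter):
        err = filter_and_sum(h, stack) - target
        losses.append(float(np.mean(err ** 2)))
        # Gradient of the mean squared error with respect to each tap.
        grad = 2.0 * np.einsum("n,mkn->mk", err, stack) / err.size
        h -= step * grad
    return h, losses
```

In a full system of the kind the abstract describes, the target would be recomputed from the recognizer's best state path after each round of beamformer updates; here it is held fixed for simplicity. The fixed step size assumes microphone signals of roughly unit variance and may need to be reduced for other inputs.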

Similar resources

Special issue on speech separation and recognition in multisource environments

One of the chief difficulties of building distant-microphone speech recognition systems for use in 'everyday' applications is that the noise background is typically 'multisource'. A speech recognition system designed to operate in a family home, for example, must contend with competing noise from televisions and radios, children playing, vacuum cleaners, and outdoor noises from open windows. D...


NSF - CAREER : The Listening Machine Annual Report 2005

Continuing our broadened theme of machine listening in many contexts, in 2005 we conducted research into automatic extraction of information in complex sound mixtures, in 'personal audio' environmental recordings, from music audio, and for the sounds of marine mammals recorded underwater. 2005 saw the graduation of Manuel Reyes, the Ph.D. student supported by this project from the start. Manuel...


NSF - CAREER : The Listening Machine IIS - 0238301 Annual Report 2007 Daniel

We have continued our research into associating words with the soundtracks of recordings of natural environments. We have been working with a database of 1400 “consumer videos” (collected by our collaborators at Kodak) as well as with similar amateur videos downloaded from YouTube. Based on a provisional lexicon of 25 terms that consumers might use as search terms (“music”, “birthday”, “beach”)...


Report from the BIT’s 1st Annual World Congress of Biomedical Engineering Held in Xi’an, China, 9–11 November 2017

We are delighted to present within this meeting report the abstracts of the "BIT's 1st World Congress of Biomedical Engineering 2017", which was held in Xi'an, China [...].


Pancreas Transplantation and Report of 1st one in IRAN

SUMMARY Since 1923, type I diabetic patients have been treated with injections of insulin. Mortality among these patients has decreased compared with patients not using insulin, but many of them have developed complications of diabetes mellitus, such as nephropathy, retinopathy and neuropathy. The treatment of choice for this disease and for preventing its complications is pancreas transplantation. The 1st pancreas...


Publication date: 2004